Who should be the Most Valuable Player (MVP) of the NBA 2021-2022 season, other than Nikola Jokic? For the sake of simplicity, the abbreviation MVP will be used throughout for easy reading.
The data sources I am using are from the NBA website; a further data source under consideration is from Kaggle.
When it comes to awards, especially in sports, there are many questions and debates as to who truly deserves them. The only times the choice was more or less unanimous were when it came to Kareem Abdul-Jabbar or Michael Jordan. In the NBA, this debate leads to a lot of controversy and spreads to other awards as well, such as Defensive Player of the Year and Sixth Man of the Year. In this research, we will analyse whether Nikola Jokic should be the MVP for the 2021-2022 NBA regular season (the Playoffs are not considered).
The candidates for the regular season MVP are listed in this article: https://www.nba.com/news/kia-mvp-ladder-april-15-edition
Origin of data: The NBA traditional stats page. It will be scraped and used to compare the leading candidates for the NBA MVP award.
Format: HTML table
This dataset will be the main focus of this project and has all the information that I require to compare the candidates. The dataset consists of a table of all the players who played in the NBA 2021-2022 season, arranged by points per game. Sorting by points per game is preferable because it puts most of the MVP candidates in the top 15. I will retrieve the data from the website via web scraping.
Origin of data: The second data source is the NBA Teams traditional stats page. I will scrape it to show how each player affects his team in terms of winning.
Format: HTML table
This dataset provides the teams' winning percentages as well as their plus/minus. I will be getting this from the NBA website as well.
Origin of data: I found another dataset on Kaggle with similar findings, but it focuses mainly on the stats of Kobe Bryant, Michael Jordan and LeBron James. This Kaggle dataset is being considered because these 3 players are considered to be the greatest who have ever played the game. The strength of this data source is that it gives me the opportunity to compare the current MVP to the greats of the game, especially since some players win multiple MVP awards.
Format: CSV
I have followed the NBA for about 5 years now and personally I feel there were some players that were snubbed for the MVP award. For example, I personally think the 2019-2020 NBA season MVP should have been LeBron James. I think Giannis was great in the 2019-2020 NBA season, but his team got eliminated in the playoffs while the Lakers dominated on their way to the championship. I personally think the 2021-2022 NBA MVP should be Stephen Curry, but we shall see as I analyse the data.
MVP topics have been discussed before, but not for a specific season. Usually the research revolves around the more popular NBA legends like Michael Jordan, or around which team is the best and who its best players are. I found an analysis of how players' age relates to how well they play in each NBA season, which I thought was very interesting. There are also discussions comparing current players to the NBA legends, to determine whether a particular player is a legend in the making. Other than the sources mentioned, I personally have not found any analysis regarding the current season and its MVP.
I will analyse the variables and provide a glossary of the terms used in the NBA for clarity.
The steps are as follows:
1) I will get the data via web scraping from the NBA website.
2) Next, I will keep only the MVP candidates and their respective teams, and compare how well they do. I will be removing some columns from the tables, such as fantasy points and triple doubles.
3) Finally, I will analyse the data by plotting graphs.
I hope to do so by analysing the stats of the 15 MVP candidates; the one who leads both his team and himself to victory should be the MVP.
| Term | Explanation |
|---|---|
| GP | Games Played |
| W | Wins |
| W_PCT | Winning Percentage |
| REB | Rebounds |
| AST | Assists |
| TOV | Turnovers |
| PTS | Points |
| W_PCT_RANK | Winning Percentage Ranking |
| PLUS_MINUS_RANK | Plus Minus Ranking |
pip install plotly==5.10.0
Requirement already satisfied: plotly==5.10.0 in c:\anaconda\lib\site-packages (5.10.0) Requirement already satisfied: tenacity>=6.2.0 in c:\anaconda\lib\site-packages (from plotly==5.10.0) (8.0.1) Note: you may need to restart the kernel to use updated packages.
import requests
from bs4 import BeautifulSoup
import re
import pandas as pd
import numpy as np
import session_info
import plotly.express as px
These are the URLs for all the NBA players and all the NBA teams.
all_nba_players_url = 'https://stats.nba.com/stats/leaguedashplayerstats?College=&Conference=&Country=&DateFrom=&DateTo=&Division=&DraftPick=&DraftYear=&GameScope=&GameSegment=&Height=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2021-22&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&TwoWay=0&VsConference=&VsDivision=&Weight='
nba_teams_url = 'https://stats.nba.com/stats/leaguedashteamstats?Conference=&DateFrom=&DateTo=&Division=&GameScope=&GameSegment=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2021-22&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&TwoWay=0&VsConference=&VsDivision='
I learnt this way of extracting the NBA stats from YouTube. I am crediting the link here, and it also appears in the reference list at the end.
https://github.com/rd11490/NBA_Tutorials/tree/master/finding_endpoints
headers = {
'Connection': 'keep-alive',
'Accept': 'application/json, text/plain, */*',
'x-nba-stats-token': 'true',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36',
'x-nba-stats-origin': 'stats',
'Sec-Fetch-Site': 'same-origin',
'Sec-Fetch-Mode': 'cors',
'Referer': 'https://stats.nba.com/',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'en-US,en;q=0.9',
}
We then send the requests and save the parsed JSON in response variables so we can access the specific data. I made one each for players and teams.
players_response = requests.get(url=all_nba_players_url, headers=headers).json()
teams_response = requests.get(url=nba_teams_url, headers=headers).json()
all_nba_players = players_response['resultSets'][0]['rowSet']
nba_teams = teams_response['resultSets'][0]['rowSet']
The columns list consists of all the variable names of the columns. This is what determines the headers for the stats.
#Below are the columns of the NBA stats page for both players and teams
players_columns_list = [
"PLAYER_ID",
"PLAYER_NAME",
"NICKNAME",
"TEAM_ID",
"TEAM_ABBREVIATION",
"AGE",
"GP",
"W",
"L",
"W_PCT",
"MIN",
"FGM",
"FGA",
"FG_PCT",
"FG3M",
"FG3A",
"FG3_PCT",
"FTM",
"FTA",
"FT_PCT",
"OREB",
"DREB",
"REB",
"AST",
"TOV",
"STL",
"BLK",
"BLKA",
"PF",
"PFD",
"PTS",
"PLUS_MINUS",
"NBA_FANTASY_PTS",
"DD2",
"TD3",
"WNBA_FANTASY_PTS",
"GP_RANK",
"W_RANK",
"L_RANK",
"W_PCT_RANK",
"MIN_RANK",
"FGM_RANK",
"FGA_RANK",
"FG_PCT_RANK",
"FG3M_RANK",
"FG3A_RANK",
"FG3_PCT_RANK",
"FTM_RANK",
"FTA_RANK",
"FT_PCT_RANK",
"OREB_RANK",
"DREB_RANK",
"REB_RANK",
"AST_RANK",
"TOV_RANK",
"STL_RANK",
"BLK_RANK",
"BLKA_RANK",
"PF_RANK",
"PFD_RANK",
"PTS_RANK",
"PLUS_MINUS_RANK",
"NBA_FANTASY_PTS_RANK",
"DD2_RANK",
"TD3_RANK",
"WNBA_FANTASY_PTS_RANK",
"CFID",
"CFPARAMS"
]
teams_columns_list = [
"TEAM_ID",
"TEAM_NAME",
"GP",
"W",
"L",
"W_PCT",
"MIN",
"FGM",
"FGA",
"FG_PCT",
"FG3M",
"FG3A",
"FG3_PCT",
"FTM",
"FTA",
"FT_PCT",
"OREB",
"DREB",
"REB",
"AST",
"TOV",
"STL",
"BLK",
"BLKA",
"PF",
"PFD",
"PTS",
"PLUS_MINUS",
"GP_RANK",
"W_RANK",
"L_RANK",
"W_PCT_RANK",
"MIN_RANK",
"FGM_RANK",
"FGA_RANK",
"FG_PCT_RANK",
"FG3M_RANK",
"FG3A_RANK",
"FG3_PCT_RANK",
"FTM_RANK",
"FTA_RANK",
"FT_PCT_RANK",
"OREB_RANK",
"DREB_RANK",
"REB_RANK",
"AST_RANK",
"TOV_RANK",
"STL_RANK",
"BLK_RANK",
"BLKA_RANK",
"PF_RANK",
"PFD_RANK",
"PTS_RANK",
"PLUS_MINUS_RANK",
"CFID",
"CFPARAMS"
]
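The column names do not necessarily have to be typed out by hand. To my knowledge, each entry in `resultSets` of a stats.nba.com response carries a `headers` list next to its `rowSet`, so the dataframe can be built from the payload alone. A minimal sketch, assuming that layout holds; `result_set_to_frame` and `sample_payload` are hypothetical names made up for illustration, and the payload is hand-written rather than real API output:

```python
import pandas as pd


def result_set_to_frame(payload, index=0):
    """Build a DataFrame from one entry of payload['resultSets'].

    Assumes the entry holds a 'headers' list and a matching 'rowSet' list.
    """
    result_set = payload["resultSets"][index]
    return pd.DataFrame(result_set["rowSet"], columns=result_set["headers"])


# Hypothetical, hand-made payload that only mimics the assumed layout.
sample_payload = {
    "resultSets": [
        {
            "headers": ["TEAM_ID", "TEAM_NAME", "W_PCT"],
            "rowSet": [
                [1610612756, "Phoenix Suns", 0.780],
                [1610612743, "Denver Nuggets", 0.585],
            ],
        }
    ]
}

sample_df = result_set_to_frame(sample_payload)
print(sample_df)
```

If the assumption holds, the same helper could be called on `players_response` and `teams_response`, which would keep the column names in step with whatever the endpoint returns.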
Time to make the dataframe for the NBA players, using the columns list above as headers. After making and saving the dataframe, I sampled 5 rows as an example.
#making dataframe process for all nba players and sampling 5
nba_players_df = pd.DataFrame(all_nba_players, columns = players_columns_list)
nba_players_df.sample(5)
| | PLAYER_ID | PLAYER_NAME | NICKNAME | TEAM_ID | TEAM_ABBREVIATION | AGE | GP | W | L | W_PCT | ... | PF_RANK | PFD_RANK | PTS_RANK | PLUS_MINUS_RANK | NBA_FANTASY_PTS_RANK | DD2_RANK | TD3_RANK | WNBA_FANTASY_PTS_RANK | CFID | CFPARAMS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 525 | 1626169 | Stanley Johnson | Stanley | 1610612747 | LAL | 26.0 | 48 | 16 | 32 | 0.333 | ... | 137 | 259 | 309 | 355 | 288 | 268 | 40 | 298 | 5 | 1626169,1610612747 |
| 259 | 201949 | James Johnson | James | 1610612751 | BKN | 35.0 | 62 | 30 | 32 | 0.484 | ... | 62 | 394 | 367 | 340 | 308 | 194 | 40 | 332 | 5 | 201949,1610612751 |
| 163 | 203926 | Doug McDermott | Doug | 1610612759 | SAS | 30.0 | 51 | 21 | 30 | 0.412 | ... | 323 | 375 | 145 | 327 | 280 | 268 | 40 | 237 | 5 | 203926,1610612759 |
| 264 | 1629020 | Jarred Vanderbilt | Jarred | 1610612750 | MIN | 23.0 | 74 | 41 | 33 | 0.554 | ... | 93 | 174 | 303 | 110 | 140 | 68 | 40 | 174 | 5 | 1629020,1610612750 |
| 162 | 1627827 | Dorian Finney-Smith | Dorian | 1610612742 | DAL | 29.0 | 80 | 51 | 29 | 0.638 | ... | 132 | 367 | 153 | 72 | 151 | 194 | 40 | 143 | 5 | 1627827,1610612742 |
5 rows × 68 columns
I did the same for the teams: I made the dataframe with the teams columns list, and then sampled 5 rows as an example.
#making dataframe process for all nba teams and sampling 5
nba_teams_df = pd.DataFrame(nba_teams, columns = teams_columns_list)
nba_teams_df.sample(5)
| | TEAM_ID | TEAM_NAME | GP | W | L | W_PCT | MIN | FGM | FGA | FG_PCT | ... | TOV_RANK | STL_RANK | BLK_RANK | BLKA_RANK | PF_RANK | PFD_RANK | PTS_RANK | PLUS_MINUS_RANK | CFID | CFPARAMS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1610612738 | Boston Celtics | 82 | 51 | 31 | 0.622 | 48.5 | 40.7 | 87.4 | 0.466 | ... | 13 | 19 | 2 | 11 | 5 | 20 | 12 | 2 | 10 | Boston Celtics |
| 13 | 1610612747 | Los Angeles Lakers | 82 | 33 | 49 | 0.402 | 48.7 | 41.6 | 88.8 | 0.469 | ... | 27 | 11 | 7 | 5 | 21 | 9 | 11 | 22 | 10 | Los Angeles Lakers |
| 14 | 1610612763 | Memphis Grizzlies | 82 | 56 | 26 | 0.683 | 48.2 | 43.5 | 94.4 | 0.461 | ... | 9 | 1 | 1 | 30 | 17 | 14 | 2 | 4 | 10 | Memphis Grizzlies |
| 20 | 1610612760 | Oklahoma City Thunder | 82 | 24 | 58 | 0.293 | 48.3 | 38.3 | 89.1 | 0.430 | ... | 15 | 14 | 16 | 29 | 4 | 30 | 30 | 28 | 10 | Oklahoma City Thunder |
| 18 | 1610612740 | New Orleans Pelicans | 82 | 36 | 46 | 0.439 | 48.2 | 40.2 | 88.0 | 0.457 | ... | 17 | 7 | 26 | 16 | 14 | 4 | 21 | 21 | 10 | New Orleans Pelicans |
5 rows × 56 columns
I noticed there were a lot of columns I do not really need, like the double doubles and fantasy points. Some of the columns I have dropped are debatable, such as MIN (Minutes) and FT_PCT (Free Throw Percentage). To explain myself with an example, I dropped the minutes (MIN) column because the players chosen are considered the top of the game and are expected to play more than other players. It makes sense to me to drop the column for this reason, and it keeps the data stable. Points is a different scenario because the most valuable player is usually the one who scores more and lifts the team.
I tried to keep the headers neat and only used the stats that most people would check, such as points, turnovers and winning percentage.
#dropping column headers that I do not need for the data
#doing for both players and teams
nba_players_df.drop(columns=[
"NICKNAME",
"FG3_PCT",
"MIN",
"PLAYER_ID",
"TEAM_ABBREVIATION",
"AGE",
"L",
"FGM",
"FGA",
"FG3M",
"FG3A",
"FTM",
"FTA",
"OREB",
"DREB",
"BLKA",
"PF",
"PFD",
"FG_PCT",
"NBA_FANTASY_PTS",
"DD2",
"TD3",
"WNBA_FANTASY_PTS",
"GP_RANK",
"W_PCT_RANK",
"MIN_RANK",
"FT_PCT_RANK",
"REB_RANK",
"PTS_RANK",
"AST_RANK",
"PF_RANK",
"PFD_RANK",
"FT_PCT",
"STL",
"BLK",
"PLUS_MINUS_RANK",
"W_RANK",
"L_RANK",
"FGM_RANK",
"FGA_RANK",
"FG_PCT_RANK",
"FG3M_RANK",
"FG3A_RANK",
"FG3_PCT_RANK",
"FTM_RANK",
"FTA_RANK",
"OREB_RANK",
"DREB_RANK",
"TOV_RANK",
"STL_RANK",
"BLK_RANK",
"BLKA_RANK",
"NBA_FANTASY_PTS_RANK",
"DD2_RANK",
"TD3_RANK",
"WNBA_FANTASY_PTS_RANK",
"CFID",
"CFPARAMS"
], inplace=True)
nba_teams_df.drop(columns = [
"GP",
"W",
"L",
"MIN",
"FGM",
"FGA",
"FG_PCT",
"FG3M",
"FG3A",
"FG3_PCT",
"FTM",
"FTA",
"FT_PCT",
"OREB",
"DREB",
"REB",
"AST",
"TOV",
"STL",
"BLK",
"BLKA",
"PF",
"PFD",
"PTS",
"PLUS_MINUS",
"GP_RANK",
"W_RANK",
"L_RANK",
"MIN_RANK",
"FGM_RANK",
"FGA_RANK",
"FG3M_RANK",
"FG3A_RANK",
"FG3_PCT_RANK",
"FTM_RANK",
"FTA_RANK",
"FT_PCT_RANK",
"OREB_RANK",
"DREB_RANK",
"REB_RANK",
"AST_RANK",
"TOV_RANK",
"STL_RANK",
"BLK_RANK",
"BLKA_RANK",
"PF_RANK",
"PFD_RANK",
"FG_PCT_RANK",
"PTS_RANK",
"CFID",
"CFPARAMS"
], inplace=True)
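Because far more columns are dropped than kept, the same trimming can also be written as a short keep-list. A minimal sketch on a made-up two-row frame; `keep_columns`, `demo_df` and `trimmed_df` are illustrative names, and the keep-list is simply the ten player columns that remain after the drop:

```python
import pandas as pd

# Columns that remain in the players table after the drop.
keep_columns = ["PLAYER_NAME", "TEAM_ID", "GP", "W", "W_PCT",
                "REB", "AST", "TOV", "PTS", "PLUS_MINUS"]

# Made-up frame with a few extra columns standing in for the full 68.
demo_df = pd.DataFrame({
    "PLAYER_NAME": ["Player A", "Player B"],
    "NICKNAME": ["A", "B"],
    "TEAM_ID": [1, 2],
    "GP": [70, 65], "W": [40, 30], "W_PCT": [0.571, 0.462],
    "MIN": [34.1, 30.2],
    "REB": [5.0, 7.1], "AST": [6.2, 2.3], "TOV": [2.5, 1.9],
    "PTS": [25.3, 18.4], "PLUS_MINUS": [3.1, -1.2],
    "DD2": [10, 4],
})

# Selecting with a list keeps only the named columns, in that order.
trimmed_df = demo_df[keep_columns]
print(trimmed_df.columns.tolist())
```

The trade-off is that a keep-list raises a `KeyError` if an expected column is missing, whereas it silently ignores any new columns the source might add later.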
After cleaning the data, I check the scraped data to see whether anything is outside the expected bounds, starting with the teams dataframe. The team values look fine. TEAM_ID is acceptable as it is because I will not use it for analysis; it is only there for the convenience of merging the two dataframes.
nba_teams_df.describe().round(1)
| | TEAM_ID | W_PCT | W_PCT_RANK | PLUS_MINUS_RANK |
|---|---|---|---|---|
| count | 3.000000e+01 | 30.0 | 30.0 | 30.0 |
| mean | 1.610613e+09 | 0.5 | 15.2 | 15.5 |
| std | 8.800000e+00 | 0.1 | 9.0 | 8.8 |
| min | 1.610613e+09 | 0.2 | 1.0 | 1.0 |
| 25% | 1.610613e+09 | 0.4 | 6.8 | 8.2 |
| 50% | 1.610613e+09 | 0.5 | 15.0 | 15.5 |
| 75% | 1.610613e+09 | 0.6 | 22.8 | 22.8 |
| max | 1.610613e+09 | 0.8 | 30.0 | 30.0 |
Next, I check the scraped data for the NBA players in the same way. The players dataframe seems to be OK; PLUS_MINUS is allowed to be negative because players can sometimes have a negative impact on the court too. TEAM_ID is again acceptable because it is only used for merging the two dataframes.
nba_players_df.describe().round(1)
| | TEAM_ID | GP | W | W_PCT | REB | AST | TOV | PTS | PLUS_MINUS |
|---|---|---|---|---|---|---|---|---|---|
| count | 6.050000e+02 | 605.0 | 605.0 | 605.0 | 605.0 | 605.0 | 605.0 | 605.0 | 605.0 |
| mean | 1.610613e+09 | 43.0 | 21.7 | 0.5 | 3.4 | 1.9 | 1.0 | 8.2 | -0.7 |
| std | 8.800000e+00 | 25.8 | 15.4 | 0.2 | 2.4 | 1.8 | 0.8 | 6.3 | 3.6 |
| min | 1.610613e+09 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -30.7 |
| 25% | 1.610613e+09 | 17.0 | 8.0 | 0.4 | 1.8 | 0.6 | 0.5 | 3.5 | -2.4 |
| 50% | 1.610613e+09 | 48.0 | 20.0 | 0.5 | 3.0 | 1.2 | 0.8 | 6.9 | -0.5 |
| 75% | 1.610613e+09 | 66.0 | 35.0 | 0.6 | 4.5 | 2.5 | 1.3 | 11.1 | 1.4 |
| max | 1.610613e+09 | 82.0 | 64.0 | 1.0 | 14.7 | 10.8 | 4.5 | 30.6 | 12.0 |
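Reading the `describe()` output by eye works for a table this small, but the same bounds can also be checked in code. A minimal sketch, assuming an 82-game regular season and the column names used in the players table; `check_player_bounds` is a hypothetical helper, and the third row of the sample data is deliberately wrong to show what a failure looks like:

```python
import pandas as pd


def check_player_bounds(df, max_games=82):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not df["GP"].between(1, max_games).all():
        problems.append("GP outside 1..max_games")
    if not df["W_PCT"].between(0, 1).all():
        problems.append("W_PCT outside 0..1")
    if (df["W"] > df["GP"]).any():
        problems.append("W greater than GP")
    # Per-game counting stats should never be negative; PLUS_MINUS may be.
    for column in ["REB", "AST", "TOV", "PTS"]:
        if (df[column] < 0).any():
            problems.append(column + " is negative")
    return problems


# Two rows copied from the players table, plus one deliberately bad row.
rows = pd.DataFrame({
    "GP": [68, 56, 90],
    "W": [45, 25, 95],
    "W_PCT": [0.662, 0.446, 1.2],
    "REB": [11.7, 8.2, 1.0],
    "AST": [4.2, 6.2, 1.0],
    "TOV": [3.1, 3.5, 1.0],
    "PTS": [30.6, 30.3, -2.0],
})

print(check_player_bounds(rows.iloc[:2]))  # only the two real rows
print(check_player_bounds(rows))           # includes the bad row
```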
Part 1
This was the tricky part of the project. I first sorted the players by PTS because the majority of the MVP candidates are in the top 15. However, one of the MVP candidates is not in the top 15, or even the top 50, for PTS. The candidate I needed was Chris Paul, who was sitting at rank 97 for points.
I first figured out how to slice the index, and then I reset the index so I could use it to slice out the players between Karl-Anthony Towns and Chris Paul. It took me a few tries to solve the issue, as I was stuck trying to figure out why the slice did not end where I wanted.
Finally, to confirm I was on the right track, I checked the dataframe, and I am glad I got what I wanted.
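#sort by points per game and renumber the rows from 0
nba_players_df.sort_values("PTS", ascending = False, inplace=True)
nba_players_df.reset_index(drop=True,inplace=True)
#drop everyone ranked below Chris Paul (index 96)
nba_players_df.drop(nba_players_df.index[97::], inplace=True)
#drop the players between Karl-Anthony Towns (index 14) and Chris Paul
nba_players_df.drop(nba_players_df.index[15:96], inplace=True)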
nba_players_df
| | PLAYER_NAME | TEAM_ID | GP | W | W_PCT | REB | AST | TOV | PTS | PLUS_MINUS |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Joel Embiid | 1610612755 | 68 | 45 | 0.662 | 11.7 | 4.2 | 3.1 | 30.6 | 5.4 |
| 1 | LeBron James | 1610612747 | 56 | 25 | 0.446 | 8.2 | 6.2 | 3.5 | 30.3 | -2.1 |
| 2 | Kevin Durant | 1610612751 | 55 | 36 | 0.655 | 7.4 | 6.4 | 3.5 | 29.9 | 4.9 |
| 3 | Giannis Antetokounmpo | 1610612749 | 67 | 45 | 0.672 | 11.6 | 5.8 | 3.3 | 29.9 | 5.9 |
| 4 | Trae Young | 1610612737 | 76 | 40 | 0.526 | 3.7 | 9.7 | 4.0 | 28.4 | 2.1 |
| 5 | Luka Doncic | 1610612742 | 65 | 44 | 0.677 | 9.1 | 8.7 | 4.5 | 28.4 | 2.2 |
| 6 | DeMar DeRozan | 1610612741 | 76 | 43 | 0.566 | 5.2 | 4.9 | 2.4 | 27.9 | 1.0 |
| 7 | Kyrie Irving | 1610612751 | 29 | 14 | 0.483 | 4.4 | 5.8 | 2.5 | 27.4 | 4.2 |
| 8 | Ja Morant | 1610612763 | 57 | 36 | 0.632 | 5.7 | 6.7 | 3.4 | 27.4 | 3.3 |
| 9 | Nikola Jokic | 1610612743 | 74 | 46 | 0.622 | 13.8 | 7.9 | 3.8 | 27.1 | 6.0 |
| 10 | Jayson Tatum | 1610612738 | 76 | 49 | 0.645 | 8.0 | 4.4 | 2.9 | 26.9 | 8.8 |
| 11 | Devin Booker | 1610612756 | 68 | 56 | 0.824 | 5.0 | 4.8 | 2.4 | 26.8 | 6.9 |
| 12 | Donovan Mitchell | 1610612762 | 67 | 41 | 0.612 | 4.2 | 5.3 | 3.0 | 25.9 | 4.1 |
| 13 | Stephen Curry | 1610612744 | 64 | 45 | 0.703 | 5.2 | 6.3 | 3.2 | 25.5 | 8.0 |
| 14 | Karl-Anthony Towns | 1610612750 | 74 | 44 | 0.595 | 9.8 | 3.6 | 3.1 | 24.6 | 3.7 |
| 96 | Chris Paul | 1610612756 | 65 | 53 | 0.815 | 4.4 | 10.8 | 2.4 | 14.7 | 7.1 |
Part 2
The one player in this list who is not considered for MVP is Kyrie Irving, so I removed his row and reset the index. This concludes my steps for getting the players table done.
nba_players_df.drop([7], inplace=True)
nba_players_df.reset_index(drop=True,inplace=True)
nba_players_df
| | PLAYER_NAME | TEAM_ID | GP | W | W_PCT | REB | AST | TOV | PTS | PLUS_MINUS |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Joel Embiid | 1610612755 | 68 | 45 | 0.662 | 11.7 | 4.2 | 3.1 | 30.6 | 5.4 |
| 1 | LeBron James | 1610612747 | 56 | 25 | 0.446 | 8.2 | 6.2 | 3.5 | 30.3 | -2.1 |
| 2 | Kevin Durant | 1610612751 | 55 | 36 | 0.655 | 7.4 | 6.4 | 3.5 | 29.9 | 4.9 |
| 3 | Giannis Antetokounmpo | 1610612749 | 67 | 45 | 0.672 | 11.6 | 5.8 | 3.3 | 29.9 | 5.9 |
| 4 | Trae Young | 1610612737 | 76 | 40 | 0.526 | 3.7 | 9.7 | 4.0 | 28.4 | 2.1 |
| 5 | Luka Doncic | 1610612742 | 65 | 44 | 0.677 | 9.1 | 8.7 | 4.5 | 28.4 | 2.2 |
| 6 | DeMar DeRozan | 1610612741 | 76 | 43 | 0.566 | 5.2 | 4.9 | 2.4 | 27.9 | 1.0 |
| 7 | Ja Morant | 1610612763 | 57 | 36 | 0.632 | 5.7 | 6.7 | 3.4 | 27.4 | 3.3 |
| 8 | Nikola Jokic | 1610612743 | 74 | 46 | 0.622 | 13.8 | 7.9 | 3.8 | 27.1 | 6.0 |
| 9 | Jayson Tatum | 1610612738 | 76 | 49 | 0.645 | 8.0 | 4.4 | 2.9 | 26.9 | 8.8 |
| 10 | Devin Booker | 1610612756 | 68 | 56 | 0.824 | 5.0 | 4.8 | 2.4 | 26.8 | 6.9 |
| 11 | Donovan Mitchell | 1610612762 | 67 | 41 | 0.612 | 4.2 | 5.3 | 3.0 | 25.9 | 4.1 |
| 12 | Stephen Curry | 1610612744 | 64 | 45 | 0.703 | 5.2 | 6.3 | 3.2 | 25.5 | 8.0 |
| 13 | Karl-Anthony Towns | 1610612750 | 74 | 44 | 0.595 | 9.8 | 3.6 | 3.1 | 24.6 | 3.7 |
| 14 | Chris Paul | 1610612756 | 65 | 53 | 0.815 | 4.4 | 10.8 | 2.4 | 14.7 | 7.1 |
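The positional slicing in Part 1 and Part 2 depends on exactly where each player lands after sorting, which would shift if the stats were refreshed. A minimal sketch of an alternative that selects the candidates by name instead; `mvp_candidates`, `demo_players` and `candidates_df` are illustrative names, the name list is read off the final players table, and "Player X" is made up. It assumes the names are spelled exactly as in the PLAYER_NAME column:

```python
import pandas as pd

# The 15 candidates as they appear in the final players table.
mvp_candidates = [
    "Joel Embiid", "LeBron James", "Kevin Durant", "Giannis Antetokounmpo",
    "Trae Young", "Luka Doncic", "DeMar DeRozan", "Ja Morant",
    "Nikola Jokic", "Jayson Tatum", "Devin Booker", "Donovan Mitchell",
    "Stephen Curry", "Karl-Anthony Towns", "Chris Paul",
]

# Small stand-in for the full players table; "Player X" is made up.
demo_players = pd.DataFrame({
    "PLAYER_NAME": ["Chris Paul", "Kyrie Irving", "Joel Embiid", "Player X"],
    "PTS": [14.7, 27.4, 30.6, 20.0],
})

# Keep only rows whose name is in the candidate list, then sort by points.
candidates_df = (
    demo_players[demo_players["PLAYER_NAME"].isin(mvp_candidates)]
    .sort_values("PTS", ascending=False)
    .reset_index(drop=True)
)
print(candidates_df)
```

With this approach Kyrie Irving never enters the table, so the separate removal step in Part 2 would not be needed.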
Cleaning the teams table is way easier than cleaning the players table.
I thought about keeping the teams that do not have an MVP candidate, but ultimately removed them from the table and reset the index. I sorted the teams by W_PCT (Winning Percentage) as it feels more accurate.
I did not sort them by W_PCT_RANK because there are 3 teams tied at the same position. For PLUS_MINUS_RANK, there seem to be gaps, but I think that is because of the teams that were removed by index.
nba_teams_df.sort_values("W_PCT", ascending = False, inplace=True)
nba_teams_df.reset_index(drop=True,inplace=True)
nba_teams_df.drop([2, 10, 14, 16, 17, 18, 19, 20, 21, 23, 24, 25, 26, 27, 28, 29], inplace=True)
nba_teams_df.reset_index(drop=True,inplace=True)
nba_teams_df
| | TEAM_ID | TEAM_NAME | W_PCT | W_PCT_RANK | PLUS_MINUS_RANK |
|---|---|---|---|---|---|
| 0 | 1610612756 | Phoenix Suns | 0.780 | 1 | 1 |
| 1 | 1610612763 | Memphis Grizzlies | 0.683 | 2 | 4 |
| 2 | 1610612744 | Golden State Warriors | 0.646 | 3 | 5 |
| 3 | 1610612742 | Dallas Mavericks | 0.634 | 5 | 8 |
| 4 | 1610612738 | Boston Celtics | 0.622 | 6 | 2 |
| 5 | 1610612755 | Philadelphia 76ers | 0.622 | 6 | 10 |
| 6 | 1610612749 | Milwaukee Bucks | 0.622 | 6 | 7 |
| 7 | 1610612762 | Utah Jazz | 0.598 | 9 | 3 |
| 8 | 1610612743 | Denver Nuggets | 0.585 | 10 | 11 |
| 9 | 1610612741 | Chicago Bulls | 0.561 | 12 | 20 |
| 10 | 1610612750 | Minnesota Timberwolves | 0.561 | 12 | 9 |
| 11 | 1610612751 | Brooklyn Nets | 0.537 | 14 | 15 |
| 12 | 1610612737 | Atlanta Hawks | 0.524 | 16 | 14 |
| 13 | 1610612747 | Los Angeles Lakers | 0.402 | 23 | 22 |
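The hard-coded list of row positions used to remove teams could also be derived from the players table, since every candidate row already carries a TEAM_ID. A minimal sketch on small stand-in frames (the IDs and W_PCT values are copied from the tables in this notebook; `demo_players`, `demo_teams` and `candidate_teams` are illustrative names):

```python
import pandas as pd

# Stand-ins for the two tables.
demo_players = pd.DataFrame({
    "PLAYER_NAME": ["Devin Booker", "Chris Paul", "Nikola Jokic"],
    "TEAM_ID": [1610612756, 1610612756, 1610612743],
})
demo_teams = pd.DataFrame({
    "TEAM_ID": [1610612756, 1610612743, 1610612760],
    "TEAM_NAME": ["Phoenix Suns", "Denver Nuggets", "Oklahoma City Thunder"],
    "W_PCT": [0.780, 0.585, 0.293],
})

# Teams that have at least one MVP candidate.
candidate_team_ids = demo_players["TEAM_ID"].unique()

candidate_teams = (
    demo_teams[demo_teams["TEAM_ID"].isin(candidate_team_ids)]
    .sort_values("W_PCT", ascending=False)
    .reset_index(drop=True)
)
print(candidate_teams)
```

This keeps the team list tied to the candidate list, so changing the candidates would not require recounting row positions.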
There were no illegal values in this dataset. I like it the way it is, and it is not too confusing.
The data is in dataframe format, which is easy to use for graphs/charts/plots.
Key variables such as 'PLUS_MINUS' for players are allowed to have negative values, and none of the other key variables has any out-of-bounds values. Data is not sorted.
The exploration is shown below.
The dataframes can easily generate charts, graphs and plots with help from Plotly, matplotlib, etc.
Time to analyse the dataset. I will be using Plotly for the graphs and will credit the link in the reference list at the end.
If the graphs do not run or show when opened, please go to File -> Trust Notebook; this step usually allows the graphs to load. If not, please restart and run the notebook again.
I used a scatter matrix to display the 4 columns PTS, REB, AST and TOV. I personally wanted to do a bar chart with the columns stacked together, but I felt it might be too cluttered. Hence I decided to go for a scatter matrix.
The first analysis I did was to compare the players on points, rebounds, assists and turnovers. What usually stands out for MVPs is their ability to knock down shots (points), make their team better (assists and rebounds) and not turn the ball over. We can see from the scatter matrix below that Joel Embiid scored the most points, Nikola Jokic secured the most rebounds and Chris Paul dished out the most assists, while the fewest turnovers (2.4 per game) are shared by DeMar DeRozan, Devin Booker and Chris Paul.
For points, we can see that the majority scored more than 25 points per game (ppg); the candidates who did not are Karl-Anthony Towns (24.6 ppg) and Chris Paul, who averaged 14.7 ppg. 3 players have a lot of points and rebounds: Giannis Antetokounmpo, Joel Embiid and Nikola Jokic. For points and assists, 4 players average more than 7 assists: Nikola Jokic, Trae Young, Luka Doncic and Chris Paul. Finally, multiple players have quite a lot of turnovers. This is usually because opposing teams focus their defense on these players and try not to let them score, as players who average a lot of points per game are usually the ones who can change the momentum of the game. The player with the most turnovers while still scoring heavily is Luka Doncic, the players with a lot of points but few turnovers are DeMar DeRozan and Devin Booker, and the player with fewer points but also few turnovers is Chris Paul.
To conclude this part of the analysis, Nikola Jokic is making a strong statement and it is understandable why he won the MVP for the 2021-2022 NBA season. As one of the top few averaging a lot of points, rebounds and assists, he is truly a Center doing it all on the court. However, he does average 3 to 4 turnovers; if he minimized them, he would be even harder for any opposing team to deal with. Jokic seems to be in the lead for now, in this part of the analysis.
fig = px.scatter_matrix(nba_players_df,
dimensions=["PTS", "REB", "AST", "TOV"],
color="PLAYER_NAME")
fig.show()
The treemap below shows the number of games played and won by the players, as well as the winning percentage and plus/minus. Plus/minus reflects the positive or negative impact the player has on the team. I chose to show the winning percentage and plus/minus in the treemap; when you hover over each player's box, you can see the games played and won.
From this analysis, we can see that Devin Booker has the most wins, the highest winning percentage and the fourth highest plus/minus of the group. Trailing behind him is his teammate Chris Paul. LeBron James has the lowest winning percentage and the only negative plus/minus, which suggests that he did not have a positive impact on his team as a whole despite averaging 30.3 ppg, which is very shocking. I did not expect LeBron James to be this low in this Player Wins analysis.
To conclude this part of the analysis, Devin Booker far exceeds the majority of the other candidates. He is only slightly above Chris Paul, which is expected because they are teammates. Nikola Jokic is not too far behind, with a winning percentage of 0.622 and a positive plus/minus of 6.0. He played more games than Booker and won 62.2% of the games he played (calculated by dividing the games won by the games played, then multiplying by 100). Nevertheless, Booker's winning percentage is about 20.2 percentage points higher than Jokic's. Booker is also just under one percentage point ahead of Chris Paul, though this is partly because Chris Paul played fewer games than Booker.
fig = px.treemap(nba_players_df, path=[px.Constant("NBA"), 'PLAYER_NAME', 'W_PCT', 'PLUS_MINUS'],
values='GP', color='W', color_continuous_scale='RdBu',
title="Games Played and Win")
fig.update_layout(margin = dict(t=50, l=25, r=25, b=25))
fig.show()
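The winning-percentage arithmetic described in this part of the analysis (games won divided by games played, times 100) can be reproduced in a few lines. A small sketch using the GP and W values from the players table; `records` and `win_percentage` are illustrative names:

```python
# (games played, games won) read from the players table.
records = {
    "Nikola Jokic": (74, 46),
    "Devin Booker": (68, 56),
    "Chris Paul": (65, 53),
}


def win_percentage(games_played, games_won):
    """Share of games won, expressed as a percentage."""
    return games_won / games_played * 100


win_pct = {name: win_percentage(gp, w) for name, (gp, w) in records.items()}

for name, pct in win_pct.items():
    print(f"{name}: {pct:.1f}%")

# Differences are in percentage points, not relative percent.
booker_vs_jokic = win_pct["Devin Booker"] - win_pct["Nikola Jokic"]
booker_vs_paul = win_pct["Devin Booker"] - win_pct["Chris Paul"]
print(f"Booker - Jokic: {booker_vs_jokic:.1f} points")
print(f"Booker - Paul: {booker_vs_paul:.1f} points")
```

Small differences against the W_PCT column are possible because that column is already rounded to three decimals.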
I chose a 3D scatter because it covers all the columns for the teams.
Moving to the team wins, the team with the highest winning percentage, plus/minus rank and winning percentage rank is the Phoenix Suns. The Suns dominated this analysis, being number 1 in everything. It is truly impressive how they were the only team above 0.7 in winning percentage while most of the others were around 0.6. Right behind the Suns were the Grizzlies and the Warriors. The team with the fewest wins and the worst plus/minus rank is the Los Angeles Lakers. This shows that, despite having a player with 30.3 ppg, team effort is important.
Nikola Jokic plays for the Denver Nuggets, who sit right in the middle of this plot; they have a slightly better result than the Minnesota Timberwolves.
In conclusion, the Suns win this part of the analysis, which means teammates Devin Booker and Chris Paul have contributed a lot to the team.
fig = px.scatter_3d(nba_teams_df, x='PLUS_MINUS_RANK', y='W_PCT_RANK', z='W_PCT',
color='TEAM_NAME', symbol='TEAM_NAME')
fig.show()
For the final part of the analysis, I will merge the dataframes on TEAM_ID. After doing so, I will analyse and compare the winning columns of the players and teams. This part allows me to look in depth at the connection between the players and the teams: are the MVP candidates part of the reason for their teams' winning success?
I tried to use the index as the merge point, but it did not work at all.
#I will use the TEAM_ID as a merge point to show the merged dataframe
players_teams_df = nba_players_df.merge(nba_teams_df, how='inner', on='TEAM_ID')
players_teams_df
| | PLAYER_NAME | TEAM_ID | GP | W | W_PCT_x | REB | AST | TOV | PTS | PLUS_MINUS | TEAM_NAME | W_PCT_y | W_PCT_RANK | PLUS_MINUS_RANK |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Joel Embiid | 1610612755 | 68 | 45 | 0.662 | 11.7 | 4.2 | 3.1 | 30.6 | 5.4 | Philadelphia 76ers | 0.622 | 6 | 10 |
| 1 | LeBron James | 1610612747 | 56 | 25 | 0.446 | 8.2 | 6.2 | 3.5 | 30.3 | -2.1 | Los Angeles Lakers | 0.402 | 23 | 22 |
| 2 | Kevin Durant | 1610612751 | 55 | 36 | 0.655 | 7.4 | 6.4 | 3.5 | 29.9 | 4.9 | Brooklyn Nets | 0.537 | 14 | 15 |
| 3 | Giannis Antetokounmpo | 1610612749 | 67 | 45 | 0.672 | 11.6 | 5.8 | 3.3 | 29.9 | 5.9 | Milwaukee Bucks | 0.622 | 6 | 7 |
| 4 | Trae Young | 1610612737 | 76 | 40 | 0.526 | 3.7 | 9.7 | 4.0 | 28.4 | 2.1 | Atlanta Hawks | 0.524 | 16 | 14 |
| 5 | Luka Doncic | 1610612742 | 65 | 44 | 0.677 | 9.1 | 8.7 | 4.5 | 28.4 | 2.2 | Dallas Mavericks | 0.634 | 5 | 8 |
| 6 | DeMar DeRozan | 1610612741 | 76 | 43 | 0.566 | 5.2 | 4.9 | 2.4 | 27.9 | 1.0 | Chicago Bulls | 0.561 | 12 | 20 |
| 7 | Ja Morant | 1610612763 | 57 | 36 | 0.632 | 5.7 | 6.7 | 3.4 | 27.4 | 3.3 | Memphis Grizzlies | 0.683 | 2 | 4 |
| 8 | Nikola Jokic | 1610612743 | 74 | 46 | 0.622 | 13.8 | 7.9 | 3.8 | 27.1 | 6.0 | Denver Nuggets | 0.585 | 10 | 11 |
| 9 | Jayson Tatum | 1610612738 | 76 | 49 | 0.645 | 8.0 | 4.4 | 2.9 | 26.9 | 8.8 | Boston Celtics | 0.622 | 6 | 2 |
| 10 | Devin Booker | 1610612756 | 68 | 56 | 0.824 | 5.0 | 4.8 | 2.4 | 26.8 | 6.9 | Phoenix Suns | 0.780 | 1 | 1 |
| 11 | Chris Paul | 1610612756 | 65 | 53 | 0.815 | 4.4 | 10.8 | 2.4 | 14.7 | 7.1 | Phoenix Suns | 0.780 | 1 | 1 |
| 12 | Donovan Mitchell | 1610612762 | 67 | 41 | 0.612 | 4.2 | 5.3 | 3.0 | 25.9 | 4.1 | Utah Jazz | 0.598 | 9 | 3 |
| 13 | Stephen Curry | 1610612744 | 64 | 45 | 0.703 | 5.2 | 6.3 | 3.2 | 25.5 | 8.0 | Golden State Warriors | 0.646 | 3 | 5 |
| 14 | Karl-Anthony Towns | 1610612750 | 74 | 44 | 0.595 | 9.8 | 3.6 | 3.1 | 24.6 | 3.7 | Minnesota Timberwolves | 0.561 | 12 | 9 |
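The `W_PCT_x` and `W_PCT_y` names in the merged table come from pandas' default suffixes for columns that exist in both frames. A minimal sketch of the same merge with explicit suffixes and a relationship check, run on small stand-in frames (values copied from the tables above); the `_PLAYER` and `_TEAM` suffixes are my own choice, not something the notebook uses:

```python
import pandas as pd

demo_players = pd.DataFrame({
    "PLAYER_NAME": ["Devin Booker", "Chris Paul", "Nikola Jokic"],
    "TEAM_ID": [1610612756, 1610612756, 1610612743],
    "W_PCT": [0.824, 0.815, 0.622],
})
demo_teams = pd.DataFrame({
    "TEAM_ID": [1610612756, 1610612743],
    "TEAM_NAME": ["Phoenix Suns", "Denver Nuggets"],
    "W_PCT": [0.780, 0.585],
})

merged = demo_players.merge(
    demo_teams,
    how="inner",
    on="TEAM_ID",
    suffixes=("_PLAYER", "_TEAM"),  # instead of the default _x / _y
    validate="many_to_one",         # raises if a TEAM_ID repeats in demo_teams
)
print(merged.columns.tolist())
```

If these suffixes were adopted in the notebook, the plotting code that refers to `W_PCT_x` and `W_PCT_y` would need the new column names.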
I chose a 3D scatter because it covers all the winning columns for players and teams. I do not need the games won so much as the winning percentage, because the players' winning percentage (represented here by W_PCT_x, since the players and teams dataframes share the same column name) allows me to see whether the player was key to and responsible for most of the wins.
For this final part of the analysis, it is clear that Devin Booker and Chris Paul are very important to the success of the Phoenix Suns. Devin Booker was slightly better than Chris Paul in terms of winning percentage (their markers are very small and sit roughly at the top; they are small because the marker size is PLUS_MINUS_RANK, where the Suns rank 1st). I tried to change the size to something bigger, but their scatter points then overlap each other, which makes them even harder to see.
Jokic (the MVP of the 2021-2022 NBA season) helped the Nuggets to the 6th seed in the Western Conference and, in this plot, lands in the middle of all the teams with MVP candidates. I can safely say Booker was definitely valuable to this Suns team, with a great winning percentage for both player and team.
In conclusion for this analysis, I will give Devin Booker the nod, followed by Chris Paul and Stephen Curry. I thought Jokic would be higher, but I guess not.
fig = px.scatter_3d(players_teams_df, x='W_PCT_x', y='W_PCT_RANK', z='W_PCT_y',
color='PLAYER_NAME', size= 'PLUS_MINUS_RANK', symbol='TEAM_NAME',
title="PLAYER TEAM RELATIONSHIP")
fig.show()
From my analysis above, I conclude that Devin Booker should be the MVP for the 2021-2022 NBA season. In terms of contribution, he is indeed vital and is the leading scorer on this Suns roster. He helped the Suns to the best record in the NBA and the top seed in the Western Conference.
I personally expected my findings to point to Stephen Curry as the MVP, but this analysis changed my perspective a lot. I never knew Booker had such a winning impact on the Suns, and he deserves more credit for it. I hope he wins an MVP one day. I am not saying Nikola Jokic does not deserve the MVP, as he did carry his team to the playoffs, even though his team was eliminated in the first round.
The first and second datasets are from the official NBA website; here are the site's terms of use.
Source: https://www.nba.com/termsofuse
Clause 9, part 1:
By using such NBA Statistics, you agree that: (1) any use, display or publication of the NBA Statistics shall include a prominent attribution to NBA.com in connection with such use, display or publication;
Clause 9, Part 2:
(2) the NBA Statistics may only be used, displayed or published for legitimate news reporting or private, non-commercial purposes;
The third dataset from Kaggle, which I did not use, has a CC0: Public Domain license.
I would say that it is possible if the outcome of the analysis does not fit what a fan or researcher is looking for. Nevertheless, stats alone do not determine whether a player is right for MVP, as there are things players do on the court that cannot be recorded. An example would be drawing an offensive charge on an opposing player to regain possession of the ball.
The purpose of my analysis is to see, from my point of view, what conclusion I reach as to who should be the MVP.
The stats are for anyone to view, as long as one does not do anything illegal with them.
The stats change with each season, so they are accurate only for now. Potential biases would be a referee missing a score or overturning a score with a rule that has not been heard of before. An example is Manu Ginóbili scoring a three-pointer (it was intended as an alley-oop, but the ball went in). The referees missed it and overturned the shot.
The candidates for the regular season MVP are listed in this article:
Source: https://www.nba.com/news/kia-mvp-ladder-april-15-edition
Extracting NBA stats
Source: https://github.com/rd11490/NBA_Tutorials/tree/master/finding_endpoints
Plotly
Source: https://plotly.com/python/